Method and apparatus for producing an abstract of a document.



A method and an apparatus for producing an abstract of a document,
capable of automatically producing a concise abstract with correct
meaning precisely indicative of the content of the document. The
method includes the steps of: listing hint words, which are
preselected words indicative of the presence of significant phrases
that can reflect the content of the document; searching for all the
hint words in the document; extracting sentences of the document in
which any one of the listed hint words is found by the search; and
producing an abstract for the document by juxtaposing the extracted
sentences. An apparatus for performing this method is also disclosed.
METHOD AND APPARATUS FOR PRODUCING AN ABSTRACT OF A DOCUMENT



BACKGROUND OF THE INVENTION 






Field of the Invention

The present invention relates to a method and an apparatus for
producing an abstract of a document from given document data.



Description of the Background Art

In recent years, it has become common to store a large number of
technical documents, such as patent documents, as files in a
database. In such a database system, key words characterizing the
particular technical field of each document are also registered for
the sake of document search.

However, in general, such key words alone are not sufficient to
properly characterize documents. For this reason, it appears
desirable to have a concise abstract summarizing each of the large
number of documents, but the sheer number of documents usually
defies practical implementation.

As a solution, there have been propositions for automatic production
of abstracts using a computer.

In one such proposition, made by H. P. Luhn in "The Automatic
Creation of Literature Abstracts", IBM J. Res. Dev., Vol. 2,
pp. 159-165, sentences in a document which contain words that appear
frequently in that document are extracted from the document as an
abstract of the document. This method is based on an assumption that
important words appear frequently in a document. However, frequently
appearing words may not necessarily be precisely indicative of the
content of the document, so that inappropriate abstracts are often
obtained by this method. Moreover, the method has a drawback that,
as the sentences with frequently appearing words are to be
extracted, the number of sentences to be extracted also tends to
become large, while a concise abstract is more desirable.
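
The frequency-based approach described above can be illustrated with a short Python sketch. It is not a reproduction of the cited method: the stop-word list, the tokenizer, the sentence splitter, and the `top_n` cutoff are all assumptions chosen for the example.

```python
import re
from collections import Counter

# Hypothetical stop-word list; the cited method's own word filtering is not reproduced.
STOP_WORDS = {"the", "a", "an", "of", "in", "is", "are", "to", "and", "by", "it"}


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def words(sentence):
    """Lowercased alphabetic tokens of a sentence."""
    return re.findall(r"[a-z]+", sentence.lower())


def frequency_abstract(text, top_n=2):
    """Extract every sentence containing one of the top_n most frequent words."""
    sentences = split_sentences(text)
    counts = Counter(
        w for s in sentences for w in words(s) if w not in STOP_WORDS
    )
    frequent = {w for w, _ in counts.most_common(top_n)}
    return [s for s in sentences if frequent & set(words(s))]
```

Because frequent words tend to occur in many sentences, a filter of this kind often keeps most of the text, which is the lack of conciseness noted above.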

In another proposition, made by D. Fun, et al. in "Step toward the
evaluation of text", IJCAI85, pp. 840-844, an attempt has been made
to evaluate the content of the document more properly so that an
abstract with correct meaning can be obtained. However, the actual
realization of such a method still remains to be achieved.

Thus, conventionally, it has been difficult to produce abstracts
automatically, so that production of the abstracts actually relied
on human resources.



SUMMARY OF THE INVENTION 
It is therefore an object of the present invention to provide a
method and an apparatus for producing an abstract of a document,
capable of automatically producing a concise abstract with correct
meaning precisely indicative of the content of the document.

According to one aspect of the present invention there is provided a
method of producing an abstract for a document, comprising the steps
of: listing hint words which are preselected words indicative of
presence of significant phrases that can reflect content of the
document; searching all the hint words in the document; extracting
sentences of the document in which any one of the listed hint words
is found by the search; and producing an abstract for the document
by juxtaposing the extracted sentences.

According to another aspect of the present invention there is
provided an apparatus for producing an abstract for a document,
comprising: means for listing hint words which are preselected words
indicative of presence of significant phrases that can reflect
content of the document; means for searching all the hint words in
the document; means for extracting sentences of the document in
which any one of the listed hint words is found by the search; and
means for producing an abstract for the document by juxtaposing the
extracted sentences.

Other features and advantages of the present invention will become
apparent from the following description taken in conjunction with
the accompanying drawings.






BRIEF DESCRIPTION OF THE DRAWINGS 



Fig. 1 is a block diagram of one embodiment of an apparatus for
producing an abstract for a document according to the present
invention.

Fig. 2 is an illustration of an example of a document for which an
abstract is to be produced by the embodiment of Fig. 1.

Fig. 3 is a diagrammatic illustration of a hint word dictionary to
be used by the embodiment of Fig. 1.

Figs. 4(A) and (B) are illustrations of the abstract produced by the
embodiment of Fig. 1 at two different stages of producing the
abstract.

Fig. 5 is a block diagram of another embodiment of an apparatus for
producing an abstract for a document according to the present
invention.

Fig. 6 is a diagrammatic illustration of a logical structure memory
to be used by the embodiment of Fig. 5.

Fig. 7 is an exemplary result of morphological analysis to be used
by the embodiment of Fig. 5.

Figs. 8(A) and (B) are illustrations of the abstract produced by the
embodiment of Fig. 1 at two different stages of producing the
abstract.

Fig. 9 is a flow chart for the abstract production by the embodiment
of Fig. 5.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENTS 



Referring now to Fig. 1, there is shown one embodiment of an
apparatus for producing an abstract of a document according to the
present invention.

In this embodiment, the apparatus comprises an input unit 1 from
which document data and operator commands are to be entered, a
document memory 2 for storing document data entered from the input
unit 1, a hint word dictionary 3 for storing hint words, which are
preselected words indicative of presence of significant phrases that
can reflect content of the document data, an abstract producing unit
4 for producing an abstract of the document data by extracting
sentences containing the hint words and combining the extracted
sentences in a manner to be explained in detail below, an output
unit 5 for displaying and printing the produced abstract along with
the document data, and a control unit 6 for coordinating operations
of the above mentioned elements of the apparatus.

In this apparatus, the document data entered from the input unit 1
are given to the control unit 6, temporarily stored in the document
memory 2, and at the same time displayed at the output unit 5, as
shown for an exemplary document in Fig. 2. Here, the document data
are displayed in units of sentences, which are labeled by sentence
numbers, but this manner of displaying is not essential to the
present invention.

Meanwhile, as mentioned above, the hint word dictionary 3 stores the
hint words, which are preselected words indicative of presence of
significant phrases that can reflect content of the document data,
in the manner shown in Fig. 3 for some examples of the hint words.
As can be seen from Fig. 3, the hint words are grouped into a number
of general categories such as 'development', 'status', and 'method',
with a number of hint words belonging to each category. In addition,
for each hint word, the part of speech to be given a higher priority
over other parts of speech as a usage of that hint word is also
given. For instance, a hint word 'develop' may arise as a verb
'develop' (or one of its conjugations) or as a part of a noun
'development'. Here, the usage as a verb 'develop' is to be given a
higher priority, as it is considered to be more indicative of
presence of significant phrases that can reflect content of the
document data. Accordingly, a symbol 'V' is entered in the part of
speech column for the hint word 'develop'. Likewise, for those hint
words for which the usage as a noun is to be given a higher
priority, a symbol 'N' is given. Again, the particular manner of
storing hint words shown in Fig. 3 is only meant to be an example
and is not essential to the present invention.
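
One way to picture such a dictionary is as a table of entries, each carrying a hint word, its general category, and the part of speech given priority. The Python sketch below is an assumed representation only: the field names and helper functions are hypothetical, and apart from 'develop' with priority 'V', which is taken from the description, the category assignments and the other entries are placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class HintWord:
    word: str          # base form of the hint word
    category: str      # general category, e.g. 'development', 'status', 'method'
    priority_pos: str  # 'V' if verb usage has priority, 'N' if noun usage does


# Only 'develop' with priority 'V' comes from the description. Its category
# and the other two entries are hypothetical placeholders.
HINT_WORDS = [
    HintWord("develop", "development", "V"),
    HintWord("method", "method", "N"),
    HintWord("progress", "status", "N"),
]


def by_category(entries):
    """Group hint words under their general category."""
    grouped = {}
    for entry in entries:
        grouped.setdefault(entry.category, []).append(entry.word)
    return grouped


def priority_pos(entries, word):
    """Return the prioritized part of speech for a hint word, or None if unlisted."""
    for entry in entries:
        if entry.word == word:
            return entry.priority_pos
    return None
```

Keeping the category and the prioritized part of speech on the same entry means a single lookup can answer both questions when a hint word is found in a sentence.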
Now, the operation of abstract producing is carried out as follows.

First of all, when a command for production of abstract is entered
from the input unit 1, the abstract producing unit 4 will be
activated by the control unit 6, and the production of abstract
begins with reading of the document data stored in the document
memory 2 by the abstract producing unit 4.

Next, the abstract producing unit 4 searches the document data for
the hint words stored in the hint word dictionary 3, and extracts
all the sentences in the document data which involve any one of the
hint words. The abstract producing unit 4 then produces an abstract
by juxtaposing all the extracted sentences.
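
The search, extraction, and juxtaposition steps can be sketched in Python as follows. This is a minimal sketch under stated assumptions: the sentence splitter, the prefix-style matching (so that 'develop' also matches 'developed'), the function names, and the returned tuple layout are illustrative choices, not details given in the description.

```python
import re


def split_sentences(text):
    """Naive splitter: break after '.', '!' or '?' followed by whitespace."""
    return [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]


def find_hint_words(sentence, hint_words):
    """Return the hint words that occur in the sentence, matched at word starts."""
    found = []
    for hint in hint_words:
        # \b anchors the match at a word boundary, so 'develop' matches 'developed'.
        if re.search(r"\b" + re.escape(hint), sentence, flags=re.IGNORECASE):
            found.append(hint)
    return found


def produce_abstract(text, hint_words):
    """Extract sentences containing any hint word and juxtapose them.

    Returns the abstract string and a list of
    (sentence_number, sentence, matched_hint_words) tuples.
    """
    extracted = []
    for number, sentence in enumerate(split_sentences(text), start=1):
        hits = find_hint_words(sentence, hint_words)
        if hits:
            extracted.append((number, sentence, hits))
    abstract = " ".join(sentence for _, sentence, _ in extracted)
    return abstract, extracted
```

Returning the sentence numbers and matched hint words alongside the abstract keeps the information an operator would need to review why each sentence was selected.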

The abstract so produced will then be transmitted through the
control unit 6 to the output unit 5, at which it will be displayed
over the document data as shown in Fig. 4(A), for example.
At this point, if desired, a command for indicating information
concerning these extracted sentences can be provided. When this
command is given from the input unit 1, shades will be overlaid on
the hint words used in extracting these extracted sentences, and the
sentence numbers of these extracted sentences are displayed, as
shown in Fig. 4(B), so that an operator may perform editing of the
produced abstract. For instance, when one of the extracted sentences
in the abstract is considered unnecessary by the operator, that
sentence may be deleted from the abstract. Such editing by an
operator can be facilitated in any known manner.
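
The display and editing step might be approximated in Python as below. Since shading cannot be shown in plain text, the sketch wraps matched words in square brackets instead. That substitution, the function names, and the (sentence number, sentence, hint words) tuple layout are assumptions made for illustration.

```python
import re


def mark_hint_words(sentence, hint_words):
    """Wrap each word that starts with a hint word in square brackets."""
    for hint in hint_words:
        pattern = r"\b(" + re.escape(hint) + r"\w*)"
        sentence = re.sub(pattern, r"[\1]", sentence, flags=re.IGNORECASE)
    return sentence


def annotate(extracted):
    """Format (sentence_number, sentence, hint_words) tuples for operator review."""
    return [
        f"({number}) {mark_hint_words(sentence, hits)}"
        for number, sentence, hits in extracted
    ]


def delete_sentence(extracted, number):
    """Drop the extracted sentence carrying the given sentence number."""
    return [item for item in extracted if item[0] != number]
```

An operator-facing tool would call `annotate` to show why each sentence was chosen and `delete_sentence` to remove one judged unnecessary.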

The abstract thus produced may further be stored either as a part of
the document data or as separate data.

Thus, according to this embodiment, it is possible to automatically
produce a concise abstract of a document with a correct meaning
precisely indicative of the content of the document, as the abstract
is produced from sentences in the document extracted by means of the
hint words, which are preselected words indicative of presence of
significant phrases that can reflect content of the document.

EP 0 361 464 A2


[...] in the hint [...] also [...] in the above embodiment, when the
[...]

This drawback is rectified in another embodiment of an apparatus for
producing an abstract of a document according to the present
invention [...] Fig. 5 [...] elements of the [...] input unit 1
[...] a language analysis unit 8 for performing morphological
analysis of [...] producing unit 4, and an abstract modification
unit 9 for [...] the abstract produced by the abstract producing
unit 4 in a manner to [...]

The logical structure memory 7 [...] indicating [...] chapters [...]
as shown in Fig. 6, for example. In Fig. 6, level labels are also
included in order to [...] hierarchical structure, although this
feature is not essential to the present invention. Such [...]
obtained by using any [...] method of analyzing [...] or entered
from the input unit 1 by [...]

[...] logical structure memory 7 will be [...] 4 in carrying out the
search [...] the hint words as [...]
more likely to appear in [...] document, such as a [...] an
introduction section [...] to be in a [...] belonging to [...]
nature such as [...] methods for such [...] sentence No. 8 [...]

The extracted sentences will appear as shown [...]

[...] is given from the input unit 1 [...] extracted sentences are
given to the abstract modification unit 9, in [...] hint words with
asterisk [...] modified as follows. Namely, for a [...] whose part
of speech [...] sentence [...] which that [...] predicate will be
created and replaces [...] sentence. In [...] hint word which is a
[...] be substituted for that






First of all, a command regarding [...] the logical structure memory
7 [...] U-flag [...] otherwise the U-flag is set to 0 at the step
103.

[...] using the language analysis unit 8 [...] then an A-flag [...]
otherwise the A-flag is [...]

[...] whether the U-flag is 0 or not [...] proceeds to the [...] 110
next, otherwise the process [...]

When the U-flag is found to be 0 at the step 110, a search of the
hint words in the hint word dictionary 3 in the I-th sentence is
carried out at the step 111, otherwise the process proceeds to [...]

[...] carried out by the language analysis unit 8 [...] each hint
word found by the language analysis [...] compared with the [...]
hint word dictionary 3 to determine whether they coincide with each
other at the step [...]. If yes, the I-th sentence is extracted as a
part of the abstract [...] the step 115 [...] returns to the [...]
above, otherwise the process simply returns to the step 108 above.

When the U-flag is found to be 1 at the step 110, the logical
attribute of the I-th sentence stored [...] returns to the step
[...] sentence is one of a [...] found to be one of a chapter header
and a paragraph header [...] 118, then the process returns to the
[...]

[...] categories of 'development' and [...] is included in the I-th
sentence [...] the process returns to the step 108 [...] otherwise
the process [...]

When the I-th sentence is found to be neither in the first chapter
nor [...] whether any hint word [...] categories of 'development'
[...] 'status' is included in the I-th sentence is [...] above.

[...] the abstract [...] the abstract modification unit 9 is given
by [...] operator from the input unit 1 [...] extracted sentence
number J [...] each of the extracted sentences [...] to 0 at the
step 123 and is increased by 1 at the step 124, otherwise the
process [...]

Then [...] the J-th extracted sentence exists or [...] determined at
the step 125. If yes, [...] extracted sentence [...] part of speech
[...] otherwise the [...]

[...] extracted sentence [...] contain [...] hint word [...] the
contained hint words whose parts of [...] verbs are predicates of
the J-th extracted [...] or not is determined [...] otherwise a
passage or a phrase [...] that contains the hint word [...]
predicates is extracted and [...] predicate for the J-th extracted
[...] the subject of [...]






logical attribute of 'affiliation' is taken as a subject for the
J-th extracted sentence at the step 131, and the process proceeds to
the step 132.

At the step 132, an original predicate of the J-th extracted
sentence is converted into an expression of adjective or adverb, and
the resulting sentence is produced as [...] a part of the abstract
at the step 133, and the process returns to the step 124 above.

It is to be noted that the distinct benefit of each of the logical
structure memory 7, language analysis unit 8, and abstract
modification unit 9 in the above embodiment can be obtained by
incorporating just one of these elements without the other two, so
that incorporation of all three as in the above embodiment is not
absolutely necessary.

[...] when the abstract modification unit 9 is omitted, the steps
122-133 in the flow chart of Fig. 9 can be omitted.

Besides these, many modifications and variations of the above
embodiments may be made without departing from the novel and
advantageous features of the present invention. Accordingly, all
such modifications and variations are intended to be included within
the scope of the appended claims.



Claims 



1. A method of producing an abstract for a document, comprising the
steps of:
listing hint words which are preselected words indicative of
presence of significant phrases that can reflect content of the
document;
searching (107-109, 111) all the hint words in the document;
extracting (115) sentences of the document in which any one of the
listed hint words is found by the search; and
producing an abstract for the document by juxtaposing the extracted
sentences.
2. The method of claim 1, further comprising the step (101-103, 110,
116-121) of listing structural attribute for each sentence of the
document; and wherein the search of some of the hint words are
performed for only a fraction of all the sentences of the document,
in accordance with nature of the hint words and the structural
attributes of the sentences.
3. The method of claim 1, further comprising the steps (104-106,
112-114) of:
listing most preferable part of speech for each hint word; and
morphologically analyzing the sentences of the document;
and wherein the extraction of the sentences is limited to those
sentences in which any one of the listed hint words that is found by
the search is in the listed type of part of speech.

4. The method of claim 1, further comprising the step (122-133) of
modifying the extracted sentences such that for a hint word that is
found with the listed type of part of speech which is a verb, a new
sentence in which that hint word appears as a predicate will be
created and replaces the original extracted sentence, while when a
subject of the extracted sentence is a pronoun or missing, an
appropriate sentence will be substituted for that pronoun, whereas
for a hint word that is found with the listed type of part of speech
which is a noun, a new sentence in which that hint word appears as a
subject will be created and replaces the original extracted
sentence.

5. The method of claim 1, further comprising the step of allowing an
operator to edit the produced abstract.

6. An apparatus for producing an abstract for a document,
comprising:
means for listing hint words which are preselected words indicative
of presence of significant phrases that can reflect content of the
document;
means (4) for searching all the hint words in the document;
means for extracting sentences of the document in which any one of
the listed hint words is found by the search; and
means for producing an abstract for the document by juxtaposing the
extracted sentences.

7. The apparatus of claim 6, further comprising means for listing
structural attribute for each sentence of the document; and wherein
the search of some of the hint words are performed for only a
fraction of all the sentences of the document, in accordance with
nature of the hint words and the structural attributes of the
sentences.

8. The apparatus of claim 6, further comprising:
means (3) for listing most preferable part of speech for each hint
word; and
means for morphologically analyzing the sentences of the document;
and wherein the extraction of the sentences is limited to those
sentences in which any one of the listed hint words that is found by
the search is in the listed type of part of speech.

9. The apparatus of claim 6, further comprising [...] the listed
type of part of speech which is a noun, a new sentence in which that
hint word appears as a subject will be created and replaces the
original extracted sentence.

10. The apparatus of claim 6, further comprising means (1) for
allowing an operator to edit the produced abstract.
FIG.8 (A)

ABSTRACT (EXTRACTED SENTENCES)

VOICE-CONTROLLABLE WRIST WATCH

TADASHI SATO
OOCLOCK CO.

IN A VOICE-CONTROLLABLE WRIST WATCH DEVELOPED
THIS TIME, ALL THE OPERATIONS BY SWITCHES IN A CONVENTIONAL
WRIST WATCH HAVE BEEN REPLACED BY THE OPERATIONS
BY HUMAN VOICE.

FOR THE METHOD OF RECOGNITION, THERE IS ---



FIG.8 (B) 



ABSTRACT (PRODUCED)

OOCLOCK CO. DEVELOPED A VOICE-CONTROLLABLE 
WRIST WATCH THAT CAN BE OPERATED BY 
HUMAN VOICE. 